Gene fusion

ABSTRACT

The disclosure provides gene fusion variants and novel associations with disease states, as well as kits, probes, and methods of using the same.

RELATED APPLICATION

This application is a divisional of U.S. application Ser. No. 14/244,824 filed Apr. 3, 2014, which claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Application No. 61/809,252 filed on Apr. 5, 2013. The entire contents of the aforementioned applications are incorporated by reference herein.

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Jun. 17, 2014, is named LT00811_SL.txt and is 2,996 bytes in size.

BACKGROUND

Chromosomal aberrations such as translocations are frequently found in human cancer cells. Chromosomal translocations may result in a chimeric gene expressing a fusion transcript which is then translated into a fusion protein that affects normal regulatory pathways and stimulates cancer cell growth.

The identification of new fusion genes or new variants of known fusion genes provides an opportunity for additional diagnostics and cancer treatment targets.

BRIEF SUMMARY

The disclosure provides novel gene fusion variants and gene fusion-disease state associations. The gene fusions provided herein are associated with certain cancers. The disclosure further provides probes, such as amplification primer sets and detection probes, as well as methods of detection, diagnosis, and treatment and kits that include or detect the gene fusions disclosed herein.

In one embodiment, the disclosure provides a composition and a kit comprising a set of probes that specifically recognize a gene fusion of FGFR3 and TACC3. The set of probes can be, for example, a set of amplification primers. In another embodiment, provided herein is a composition that includes a set of primers that flank an FGFR3 and TACC3 breakpoint in a target nucleic acid. The reaction mixture of this embodiment can further include a detector probe that binds to either side of the FGFR3 and TACC3 breakpoint, or that binds a binding region that spans the FGFR3 and TACC3 breakpoint. The reaction mixture, with or without a detector probe, can further include a polymerase, dNTPs, and/or a uracil DNA glycosylase (UDG). The polymerase and UDG are typically not from a human origin. The reaction mixture can further include a target nucleic acid, for example a human target nucleic acid. The human target nucleic acid can be, for example, isolated from a biological sample from a person suspected of having bladder, head and neck, or lung squamous cell carcinoma.

In another embodiment, a set of probes that specifically recognize a nucleic acid comprising at least one of SEQ ID NOs: 1-12 is provided. In another embodiment, provided herein is a set of primers that specifically amplify a target nucleic acid that includes SEQ ID NOs: 1-12. In another embodiment, provided herein is a qPCR assay, such as a TaqMan assay or a Molecular Beacons assay, that specifically amplifies and detects a target nucleic acid that includes SEQ ID NOs: 1-12.

The disclosure also provides an isolated nucleic acid comprising at least one sequence selected from SEQ ID NOs: 1-12. The isolated nucleic acid can include a first primer on a 5′ end. Furthermore, the nucleic acid can be single stranded or double stranded.

The disclosure, in other embodiments, provides a kit that includes a detector probe and/or a set of probes, for example, a set of amplification primers that specifically recognize a nucleic acid comprising an FGFR3 and TACC3 breakpoint. For example, in certain embodiments the detector probe or set of amplification primers are designed to amplify and/or detect a nucleic acid that includes SEQ ID NOs: 1-12. The kit can further include, in a separate or in the same vessel, a component from an amplification reaction mixture, such as a polymerase, typically not from human origin, dNTPs, and/or UDG. Furthermore, the kit can include a control nucleic acid. For example, the control nucleic acid can include a sequence that includes the breakpoint from Table 3.

A method of detecting bladder, head and neck, or lung squamous cell carcinoma is provided comprising amplifying a nucleic acid that spans an FGFR3 and TACC3 breakpoint, for example the nucleic acid can include a sequence selected from SEQ ID NOs: 1-12, and detecting the presence of the nucleic acid, wherein the presence of the nucleic acid indicates bladder, head and neck, or lung squamous cell carcinoma is present in the sample. In another method, provided herein is a method of detecting bladder, head and neck, or lung squamous cell carcinoma that includes generating an amplicon that includes a sequence selected from SEQ ID NOs: 1-12, and detecting the presence of the nucleic acid, wherein the presence of the nucleic acid indicates bladder, head and neck, or lung squamous cell carcinoma is present in the sample. The amplicon typically includes primers that are extended to form the amplicon.

A kit comprising a set of probes, for example, a set of amplification primers that specifically recognize a nucleic acid comprising a breakpoint from Table 3 is provided. The kit can further include, in a separate or in the same vessel, a component from an amplification reaction mixture, such as a polymerase, typically not from human origin, dNTPs, and/or UDG. Furthermore, the kit can include a control nucleic acid. For example, the control nucleic acid can include a sequence that includes the breakpoint from Table 3.

In certain embodiments, a set of probes that specifically recognize a nucleic acid comprising a breakpoint from Table 3 is provided.

In another embodiment, a gene fusion is provided comprising at least one of the breakpoints in Table 3.

In another embodiment is a method to detect lung squamous cell carcinoma or thyroid carcinoma in a sample by detecting the presence of a CCDC6/RET gene fusion.

In yet another embodiment, the disclosure provides a method of detecting breast carcinoma in a sample by detecting the presence of an ERC1/RET gene fusion.

DESCRIPTION OF THE DRAWINGS

FIG. 1 is a chart showing the number of samples processed per disease state. 4,225 samples were processed across 19 diseases with defuse and tophat gene fusion calling software using cloud-based computing.

FIG. 2 provides 2 gene fusion processing work flows. Gene fusion processing produced 4.5 million calls that were filtered and prioritized to generate a list of high confidence “priority” fusion calls.

FIG. 3 shows gene fusion detection using cluster computing: 4.65 compute years in 6 days. Three 500 node clusters were used to process samples using defuse caller software. In total, 4,225 RNASeq samples were processed generating 28.2 TB of fusion results data.

FIG. 4A shows higher frequency calls across patient populations prior to filtering.

FIG. 4B is a graph showing recurrent priority gene fusions (FGFR3/TACC3, ALK/EML4, CCDC6/RET, TMPRSS2/ERG) post-filtering and their frequencies in various disease states.

FIGS. 5A1-5A2 show defuse breakpoint calls for each of the 23 fusion-positive patients.

FIG. 5B shows a fused gene product exon map showing exons of each gene upstream and downstream of the breakpoint.

FIG. 6A is a graph showing that fused and non-fused samples (TMPRSS2 and ERG) exhibited exon expression imbalance prior to and after predicted fusion breakpoints. 3′ partner genes of fused samples had elevated expression compared to non-fused samples.

FIG. 6B is a graph of exon imbalance of ERG and TMPRSS2. If 3′ partner expression is impacted by the 5′ partner's promoter region, then exon expression should increase after the predicted breakpoint. This effect is especially visible when viewing fused versus non-fused patient samples.

FIG. 7 is a graph showing that increased expression for both fusion partners (FGFR3 and TACC3) was observed in fused samples versus non-fused samples. 9 fused samples were studied across bladder, head and neck, and lung squamous carcinomas.

FIG. 8 shows exon expression imbalance at TACC3, exon 10 observed across bladder, head and neck, and lung squamous carcinomas.

FIG. 9 is a graph showing gene fusions involving RET observed with multiple partners (e.g. CCDC6 and ERC1) across thyroid, lung squamous and breast carcinomas. Breakpoints were observed at similar locations within RET and CCDC6 fusion partners.

FIGS. 10A and 10B are graphs depicting the wild-type and predicted fusion expression in PRAD of TMPRSS2/ERG (A); and exon expression v. exon boundary with predicted breakpoints in ERG (B).

FIGS. 11A and 11B are graphs depicting the wild-type and predicted fusion expression in THCA of RET (A); and exon expression v. exon boundary with predicted breakpoints in RET (B).

FIG. 12 is a RET exon map showing cadherin-like, cysteine rich, and tyrosine kinase domains.

DETAILED DESCRIPTION

The disclosure provides novel gene fusions and variants, as well as novel associations of the gene fusions with certain types of cancers. Further provided are probes, assays and kits using the gene fusions disclosed herein.

Definitions

The term “marker” or “biomarker” refers to a molecule (typically protein, nucleic acid, carbohydrate, or lipid) that is expressed in the cell, expressed on the surface of a cancer cell or secreted by a cancer cell in comparison to a non-cancer cell, and which is useful for the diagnosis of cancer, for providing a prognosis, and for preferential targeting of a pharmacological agent to the cancer cell. Oftentimes, such markers are molecules that are overexpressed in a lung cancer or other cancer cell in comparison to a non-cancer cell, for instance, 1-fold overexpression, 2-fold overexpression, 3-fold overexpression or more in comparison to a normal cell. Further, a marker can be a molecule that is inappropriately synthesized in the cancer cell, for instance, a molecule that contains deletions, additions or mutations in comparison to the molecule expressed on a normal cell. Alternatively, such biomarkers are molecules that are under-expressed in a cancer cell in comparison to a non-cancer cell, for instance, 1-fold underexpression, 2-fold underexpression, 3-fold underexpression, or more. Further, a marker can be a molecule that is inappropriately synthesized in cancer, for instance, a molecule that contains deletions, additions or mutations in comparison to the molecule expressed on a normal cell.

It will be understood by the skilled artisan that markers may be used in combination with other markers or tests for any of the uses, e.g., prediction, diagnosis, or prognosis of cancer, disclosed herein.

“Biological sample” includes sections of tissues such as biopsy and autopsy samples, and frozen sections taken for histologic purposes. Alternatively, a biological sample can include blood and blood fractions or products (e.g., serum, plasma, platelets, red blood cells, and the like), sputum, bronchoalveolar lavage, cultured cells, e.g., primary cultures, explants, and transformed cells, stool, urine, etc. A biological sample is typically obtained from a eukaryotic organism, most preferably a mammal such as a primate, e.g., chimpanzee or human; cow; dog; cat; a rodent, e.g., guinea pig, rat, mouse; rabbit; or a bird; reptile; or fish.

A “biopsy” refers to the process of removing a tissue sample for diagnostic or prognostic evaluation, and to the tissue specimen itself. Any biopsy technique known in the art can be applied to the diagnostic and prognostic methods of the present invention. The biopsy technique applied will depend on the tissue type to be evaluated (e.g., lung, etc.), the size and type of the tumor, among other factors. Representative biopsy techniques include, but are not limited to, excisional biopsy, incisional biopsy, needle biopsy, surgical biopsy, and bone marrow biopsy. An “excisional biopsy” refers to the removal of an entire tumor mass with a small margin of normal tissue surrounding it. An “incisional biopsy” refers to the removal of a wedge of tissue from within the tumor. A diagnosis or prognosis made by endoscopy or radiographic guidance can require a “core-needle biopsy”, or a “fine-needle aspiration biopsy” which generally obtains a suspension of cells from within a target tissue. Biopsy techniques are discussed, for example, in Harrison's Principles of Internal Medicine, Kasper, et al., eds., 16th ed., 2005, Chapter 70, and throughout Part V.

The terms “overexpress,” “overexpression,” or “overexpressed” interchangeably refer to a protein or nucleic acid (RNA) that is translated or transcribed at a detectably greater level, usually in a cancer cell, in comparison to a normal cell. The term includes overexpression due to transcription, post transcriptional processing, translation, post-translational processing, cellular localization (e.g., organelle, cytoplasm, nucleus, cell surface), and RNA and protein stability, as compared to a normal cell. Overexpression can be detected using conventional techniques for detecting mRNA (e.g., RT-PCR, PCR, hybridization) or proteins (e.g., ELISA, immunohistochemical techniques). Overexpression can be 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% or more in comparison to a normal cell. In certain instances, overexpression is 1-fold, 2-fold, 3-fold, 4-fold or more, higher levels of transcription or translation in comparison to a normal cell.
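By way of non-limiting illustration, the percent and fold comparisons described above can be sketched as follows. The function names, the use of a simple ratio of measured levels, and the 10% default threshold are assumptions made for this example only; the disclosure does not prescribe a particular calculation.

```python
def fold_change(sample_level: float, normal_level: float) -> float:
    """Ratio of the expression level in the test cell to the level in the normal cell."""
    if normal_level <= 0:
        raise ValueError("normal_level must be positive")
    return sample_level / normal_level


def percent_change(sample_level: float, normal_level: float) -> float:
    """Percent increase (positive) or decrease (negative) relative to the normal cell."""
    return (fold_change(sample_level, normal_level) - 1.0) * 100.0


def classify(sample_level: float, normal_level: float,
             threshold_percent: float = 10.0) -> str:
    """Label a measurement using a hypothetical percent threshold (assumed, not from the text)."""
    change = percent_change(sample_level, normal_level)
    if change >= threshold_percent:
        return "overexpressed"
    if change <= -threshold_percent:
        return "underexpressed"
    return "unchanged"
```

In this sketch a level of 15 against a normal level of 10 is a 50% increase and would be labeled overexpressed at the assumed threshold.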

The terms “underexpress,” “underexpression,” or “underexpressed” or “downregulated” interchangeably refer to a protein or nucleic acid that is translated or transcribed at a detectably lower level in a cancer cell, in comparison to a normal cell. The term includes underexpression due to transcription, post transcriptional processing, translation, post-translational processing, cellular localization (e.g., organelle, cytoplasm, nucleus, cell surface), and RNA and protein stability, as compared to a control. Underexpression can be detected using conventional techniques for detecting mRNA (e.g., RT-PCR, PCR, hybridization) or proteins (e.g., ELISA, immunohistochemical techniques). Underexpression can be 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% or less in comparison to a control. In certain instances, underexpression is 1-fold, 2-fold, 3-fold, 4-fold or more lower levels of transcription or translation in comparison to a control.

The term “differentially expressed” or “differentially regulated” refers generally to a protein or nucleic acid that is overexpressed (upregulated) or underexpressed (downregulated) in one sample compared to at least one other sample, generally in a cancer patient compared to a sample of non-cancerous tissue in the context of the present invention.

The terms “polypeptide,” “peptide” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residue is an artificial chemical mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers and non-naturally occurring amino acid polymers.

The term “amino acid” refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids. Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine. Amino acid analogs refers to compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid. Amino acid mimetics refers to chemical compounds that have a structure that is different from the general chemical structure of an amino acid, but that function in a manner similar to a naturally occurring amino acid.

Amino acids may be referred to herein by either their commonly known three letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes.

As to amino acid sequences, one of skill will recognize that individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alters, adds or deletes a single amino acid or a small percentage of amino acids in the encoded sequence is a “conservatively modified variant” where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art. Such conservatively modified variants are in addition to and do not exclude polymorphic variants, interspecies homologs, and alleles of the invention.

The following eight groups each contain amino acids that are conservative substitutions for one another: 1) Alanine (A), Glycine (G); 2) Aspartic acid (D), Glutamic acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W); 7) Serine (S), Threonine (T); and 8) Cysteine (C), Methionine (M). See, e.g., Creighton, Proteins (1984).
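As a non-limiting sketch, the eight groups listed above can be encoded as sets of one-letter codes and used to test whether an amino acid exchange is conservative. The names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical and are introduced only for this example.

```python
# The eight conservative substitution groups, written with one-letter codes.
CONSERVATIVE_GROUPS = [
    {"A", "G"},
    {"D", "E"},
    {"N", "Q"},
    {"R", "K"},
    {"I", "L", "M", "V"},
    {"F", "Y", "W"},
    {"S", "T"},
    {"C", "M"},
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if the two residues differ and share at least one of the eight groups."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # an identical residue is not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)
```

Methionine appears in groups 5 and 8, so under this encoding it is a conservative substitution for cysteine as well as for isoleucine, leucine, and valine, while cysteine and leucine are not conservative substitutions for each other.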

The phrase “specifically (or selectively) binds” when referring to a protein, nucleic acid, antibody, or small molecule compound refers to a binding reaction that is determinative of the presence of the protein or nucleic acid, such as the differentially expressed genes of the present invention, often in a heterogeneous population of proteins or nucleic acids and other biologics. In the case of antibodies, under designated immunoassay conditions, a specified antibody may bind to a particular protein at least two times the background and more typically more than 10 to 100 times background. Specific binding to an antibody under such conditions requires an antibody that is selected for its specificity for a particular protein. For example, polyclonal antibodies can be selected to obtain only those polyclonal antibodies that are specifically immunoreactive with the selected antigen and not with other proteins. This selection may be achieved by subtracting out antibodies that cross-react with other molecules. A variety of immunoassay formats may be used to select antibodies specifically immunoreactive with a particular protein. For example, solid-phase ELISA immunoassays are routinely used to select antibodies specifically immunoreactive with a protein (see, e.g., Harlow & Lane, Antibodies, A Laboratory Manual (1988) for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity).

The phrase “functional effects” in the context of assays for testing compounds that modulate a marker protein includes the determination of a parameter that is indirectly or directly under the influence of a biomarker of the invention, e.g., a chemical or phenotypic effect. A functional effect therefore includes ligand binding activity, transcriptional activation or repression, the ability of cells to proliferate, the ability to migrate, among others. “Functional effects” include in vitro, in vivo, and ex vivo activities.

By “determining the functional effect” is meant assaying for a compound that increases or decreases a parameter that is indirectly or directly under the influence of a biomarker of the invention, e.g., measuring physical and chemical or phenotypic effects. Such functional effects can be measured by any means known to those skilled in the art, e.g., changes in spectroscopic characteristics (e.g., fluorescence, absorbance, refractive index); hydrodynamic (e.g., shape), chromatographic; or solubility properties for the protein; ligand binding assays, e.g., binding to antibodies; measuring inducible markers or transcriptional activation of the marker; measuring changes in enzymatic activity; the ability to increase or decrease cellular proliferation, apoptosis, cell cycle arrest, measuring changes in cell surface markers. The functional effects can be evaluated by many means known to those skilled in the art, e.g., microscopy for quantitative or qualitative measures of alterations in morphological features, measurement of changes in RNA or protein levels for other genes expressed in placental tissue, measurement of RNA stability, identification of downstream or reporter gene expression (CAT, luciferase, β-gal, GFP and the like), e.g., via chemiluminescence, fluorescence, colorimetric reactions, antibody binding, inducible markers, etc.

“Inhibitors,” “activators,” and “modulators” of the markers are used to refer to activating, inhibitory, or modulating molecules identified using in vitro and in vivo assays of cancer biomarkers. Inhibitors are compounds that, e.g., bind to, partially or totally block activity, decrease, prevent, delay activation, inactivate, desensitize, or down regulate the activity or expression of cancer biomarkers. “Activators” are compounds that increase, open, activate, facilitate, enhance activation, sensitize, agonize, or up regulate activity of cancer biomarkers, e.g., agonists. Inhibitors, activators, or modulators also include genetically modified versions of cancer biomarkers, e.g., versions with altered activity, as well as naturally occurring and synthetic ligands, antagonists, agonists, antibodies, peptides, cyclic peptides, nucleic acids, antisense molecules, ribozymes, RNAi and siRNA molecules, small organic molecules and the like. Such assays for inhibitors and activators include, e.g., expressing cancer biomarkers in vitro, in cells, or cell extracts, applying putative modulator compounds, and then determining the functional effects on activity, as described above.

In some embodiments, a kit that includes a set of probes is provided. A “probe” or “probes” refers to a polynucleotide that is at least eight (8) nucleotides in length and which forms a hybrid structure with a target sequence, due to complementarity of at least one sequence in the probe with a sequence in the target region. The polynucleotide can be composed of DNA and/or RNA. Probes, in certain embodiments, are detectably labeled, as discussed in more detail herein. Probes can vary significantly in size. Generally, probes are, for example, at least 8 to 15 nucleotides in length. Other probes are, for example, at least 20, 30 or 40 nucleotides long. Still other probes are somewhat longer, being at least, for example, 50, 60, 70, 80, 90 nucleotides long. Yet other probes are longer still, and are at least, for example, 100, 150, 200 or more nucleotides long. Probes can be of any specific length that falls within the foregoing ranges as well. Preferably, the probe does not contain a sequence complementary to the sequence(s) used to prime for a target sequence during the polymerase chain reaction.

The terms “complementary” or “complementarity” are used in reference to polynucleotides (that is, a sequence of nucleotides) related by the base-pairing rules. For example, the sequence “A-G-T,” is complementary to the sequence “T-C-A.” Complementarity may be “partial,” in which only some of the nucleic acids' bases are matched according to the base pairing rules. Alternatively, there may be “complete” or “total” complementarity between the nucleic acids. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands.
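As a non-limiting illustration of the base-pairing rules, the following sketch computes the complement of a DNA sequence and the fraction of paired positions between two sequences. It assumes that the two sequences are written so that pairing bases occupy the same position, as in the “A-G-T”/“T-C-A” example, and the function names are hypothetical.

```python
# Watson-Crick pairing for DNA bases.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Position-by-position complement of a DNA sequence."""
    return "".join(PAIRS[base] for base in seq.upper())


def complementarity(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions whose bases pair: 1.0 is total, lower values are partial."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(PAIRS[a] == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return paired / len(seq_a)
```

With this sketch, “AGT” and “TCA” show total complementarity, while “AGT” and “TCG” pair at two of three positions and are therefore only partially complementary.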

“Oligonucleotide” or “polynucleotide” refers to a polymer of a single-stranded or double-stranded deoxyribonucleotide or ribonucleotide, which may be unmodified RNA or DNA or modified RNA or DNA.

“Amplification detection assay” refers to a primer pair and matched probe wherein the primer pair flanks a region of a target nucleic acid, typically a target gene, that defines an amplicon, and wherein the probe binds to the amplicon.
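The geometry of such an assay can be sketched, by way of non-limiting example, with coordinates on a target sequence. The `Interval` type, the 0-based half-open coordinate convention, and the representation of a breakpoint as the index of the first base of the 3′ partner are all assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def amplicon(forward: Interval, reverse: Interval) -> Interval:
    """Region defined by a primer pair, including both primer binding sites."""
    return Interval(forward.start, reverse.end)


def flanks(forward: Interval, reverse: Interval, breakpoint: int) -> bool:
    """True if the forward primer lies entirely before and the reverse primer entirely after the breakpoint."""
    return forward.end <= breakpoint <= reverse.start


def probe_within_amplicon(probe: Interval, forward: Interval, reverse: Interval) -> bool:
    """True if the probe binds between the two primer binding sites."""
    return forward.end <= probe.start and probe.end <= reverse.start


def probe_spans(probe: Interval, breakpoint: int) -> bool:
    """True if the probe binding region covers bases on both sides of the breakpoint."""
    return probe.start < breakpoint < probe.end
```

For example, primers at positions 100-120 and 180-200 flank a breakpoint at position 150, and a probe binding at positions 140-160 both lies within the amplicon and spans that breakpoint.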

A set of probes typically refers to a set of primers, usually primer pairs, and/or detectably-labeled probes that are used to detect the target genetic variations. The primer pairs are used in an amplification reaction to define an amplicon that spans a region for a target genetic variation for each of the aforementioned genes. The set of amplicons are detected by a set of matched probes. In an exemplary embodiment, the invention is a set of TaqMan™ (Roche Molecular Systems, Pleasanton, Calif.) assays that are used to detect a set of target genetic variations used in the methods of the invention.

In one embodiment, the set of probes are a set of primers used to generate amplicons that are detected by a nucleic acid sequencing reaction, such as a next generation sequencing reaction. In these embodiments, for example, AmpliSEQ™ (Life Technologies/Ion Torrent, Carlsbad, Calif.) or TruSEQ™ (Illumina, San Diego, Calif.) technology can be employed.

A modified ribonucleotide or deoxyribonucleotide refers to a molecule that can be used in place of naturally occurring bases in nucleic acid and includes, but is not limited to, modified purines and pyrimidines, minor bases, convertible nucleosides, structural analogs of purines and pyrimidines, labeled, derivatized and modified nucleosides and nucleotides, conjugated nucleosides and nucleotides, sequence modifiers, terminus modifiers, spacer modifiers, and nucleotides with backbone modifications, including, but not limited to, ribose-modified nucleotides, phosphoramidates, phosphorothioates, phosphonamidites, methyl phosphonates, methyl phosphoramidites, methyl phosphonamidites, 5′-β-cyanoethyl phosphoramidites, methylenephosphonates, phosphorodithioates, peptide nucleic acids, achiral and neutral internucleotidic linkages.

“Hybridize” or “hybridization” refers to the binding between nucleic acids. The conditions for hybridization can be varied according to the sequence homology of the nucleic acids to be bound. Thus, if the sequence homology between the subject nucleic acids is high, stringent conditions are used. If the sequence homology is low, mild conditions are used. When the hybridization conditions are stringent, the hybridization specificity increases, and this increase of the hybridization specificity leads to a decrease in the yield of non-specific hybridization products. However, under mild hybridization conditions, the hybridization specificity decreases, and this decrease in the hybridization specificity leads to an increase in the yield of non-specific hybridization products.

“Stringent conditions” refers to conditions under which a probe will hybridize to its target subsequence, typically in a complex mixture of nucleic acids, but to no other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Probes, “Overview of principles of hybridization and the strategy of nucleic acid assays” (1993). Generally, stringent conditions are selected to be about 5-10° C. lower than the thermal melting point (T_(m)) for the specific sequence at a defined ionic strength and pH. The T_(m) is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at T_(m), 50% of the probes are occupied at equilibrium). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. For selective or specific hybridization, a positive signal is at least two times background, preferably 10 times background hybridization. Exemplary stringent hybridization conditions can be as follows: 50% formamide, 5×SSC, and 1% SDS, incubating at 42° C., or, 5×SSC, 1% SDS, incubating at 65° C., with wash in 0.2×SSC, and 0.1% SDS at 65° C.
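As a non-limiting numerical sketch of the 5-10° C. relationship described above, the following example estimates a melting temperature and derives a candidate stringent temperature window. The Wallace rule used here (2° C. per A or T, 4° C. per G or C) is a rough rule of thumb for short oligonucleotides that is assumed for illustration only; it is not the method of the disclosure and does not account for ionic strength, pH, or formamide. The function names are hypothetical.

```python
def wallace_tm(seq: str) -> float:
    """Rule-of-thumb melting temperature in deg C for a short DNA oligonucleotide (assumed estimator)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if not seq or at + gc != len(seq):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return 2.0 * at + 4.0 * gc


def stringent_window(tm: float) -> tuple[float, float]:
    """Temperature window about 5 to 10 deg C below the melting temperature."""
    return (tm - 10.0, tm - 5.0)
```

For a 16-nucleotide probe with eight A/T and eight G/C bases, this estimate gives a T_(m) of 48° C. and a window of 38-43° C.; in practice the T_(m) would be determined for the actual buffer conditions.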

Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the polypeptides which they encode are substantially identical. This occurs, for example, when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code. In such cases, the nucleic acids typically hybridize under moderately stringent hybridization conditions. Exemplary “moderately stringent hybridization conditions” include a hybridization in a buffer of 40% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 1×SSC at 45° C. A positive hybridization is at least twice background. Those of ordinary skill will readily recognize that alternative hybridization and wash conditions can be utilized to provide conditions of similar stringency. Additional guidelines for determining hybridization parameters are provided in numerous references, e.g., and Current Protocols in Molecular Biology, ed.

Hybridization between nucleic acids can occur between a DNA molecule and a DNA molecule, between a DNA molecule and an RNA molecule, and between an RNA molecule and an RNA molecule.

A “mutein” or “variant” refers to a polynucleotide or polypeptide that differs relative to a wild-type or the most prevalent form in a population of individuals by the exchange, deletion, or insertion of one or more nucleotides or amino acids, respectively. The number of nucleotides or amino acids exchanged, deleted, or inserted can be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more such as 25, 30, 35, 40, 45 or 50. The term mutein can also encompass a translocation, for example the fusion of genes encoding the polypeptides FGFR3/TACC3, RET/CCDC6, RET/ERC1, and/or TMPRSS2/ERG.

“Single nucleotide polymorphism” or “SNP” refers to a DNA sequence variation that occurs when a single nucleotide (A, T, G, or C) in the genome differs between members of a biological species or paired chromosomes in a human.
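By way of non-limiting illustration, single nucleotide differences between two sequences that are already aligned and of equal length can be listed as follows. The function name and the assumption of pre-aligned input are hypothetical simplifications; real variant calling involves alignment and quality filtering that this sketch omits.

```python
def single_nucleotide_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, reference base, sample base) for each mismatching position."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference.upper(), sample.upper()))
        if ref_base != sample_base
    ]
```

For example, comparing “ACGTAC” with “ACGAAC” reports one difference, a T to A change at position 3.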

In other embodiments, the two or more probes are primer pairs.

A “primer” or “primer sequence” refers to an oligonucleotide that hybridizes to a target nucleic acid sequence (for example, a DNA template to be amplified) to prime a nucleic acid synthesis reaction. The primer may be a DNA oligonucleotide, a RNA oligonucleotide, or a chimeric sequence. The primer may contain natural, synthetic, or modified nucleotides. Both the upper and lower limits of the length of the primer are empirically determined. The lower limit on primer length is the minimum length that is required to form a stable duplex upon hybridization with the target nucleic acid under nucleic acid amplification reaction conditions. Very short primers (usually less than 3-4 nucleotides long) do not form thermodynamically stable duplexes with target nucleic acid under such hybridization conditions. The upper limit is often determined by the possibility of having a duplex formation in a region other than the pre-determined nucleic acid sequence in the target nucleic acid. Generally, suitable primer lengths are in the range of about 10 to about 40 nucleotides long. In certain embodiments, for example, a primer can be 10-40, 15-30, or 10-20 nucleotides long. A primer is capable of acting as a point of initiation of synthesis on a polynucleotide sequence when placed under appropriate conditions.

The primer will be completely or substantially complementary to a region of the target polynucleotide sequence to be copied. Therefore, under conditions conducive to hybridization, the primer will anneal to the complementary region of the target sequence. Upon addition of suitable reactants, including, but not limited to, a polymerase, nucleotide triphosphates, etc., the primer is extended by the polymerizing agent to form a copy of the target sequence. The primer may be single-stranded or alternatively may be partially double-stranded.
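As a non-limiting sketch of a primer that is completely or substantially complementary to a target region, the following example finds positions on a template strand where a primer could anneal, allowing a chosen number of mismatches. The function names and the mismatch parameter are assumptions for illustration; actual annealing also depends on temperature and buffer conditions, which are not modeled here.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, read 5' to 3'."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))


def annealing_sites(primer: str, template: str, max_mismatches: int = 0) -> list[int]:
    """0-based start positions on the template where the primer can pair.

    The primer pairs with the stretch of template that equals the primer's
    reverse complement; up to max_mismatches non-pairing positions are tolerated.
    """
    target = reverse_complement(primer)
    template = template.upper()
    sites = []
    for start in range(len(template) - len(target) + 1):
        window = template[start:start + len(target)]
        mismatches = sum(a != b for a, b in zip(window, target))
        if mismatches <= max_mismatches:
            sites.append(start)
    return sites
```

Setting `max_mismatches` to zero models a completely complementary primer, while a small positive value models a substantially complementary one.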

In some embodiments there is provided a kit encompassing at least 2 primer pairs and 2 detectably labeled probes. In these non-limiting embodiments, the 2 primer pairs and/or 2 detectably labeled probes form 2 amplification detection assays.

“Detection,” “detectable” and grammatical equivalents thereof refers to ways of determining the presence and/or quantity and/or identity of a target nucleic acid sequence. In some embodiments, detection occurs by amplifying the target nucleic acid sequence. In other embodiments, sequencing of the target nucleic acid can be characterized as “detecting” the target nucleic acid. A label attached to the probe can include any of a variety of different labels known in the art that can be detected by, for example, chemical or physical means. Labels that can be attached to probes may include, for example, fluorescent and luminescence materials.

“Amplifying,” “amplification,” and grammatical equivalents thereof refers to any method by which at least a part of a target nucleic acid sequence is reproduced in a template-dependent manner, including without limitation, a broad range of techniques for amplifying nucleic acid sequences, either linearly or exponentially. Exemplary means for performing an amplifying step include ligase chain reaction (LCR), ligase detection reaction (LDR), ligation followed by Q-replicase amplification, PCR, primer extension, strand displacement amplification (SDA), hyperbranched strand displacement amplification, multiple displacement amplification (MDA), nucleic acid sequence-based amplification (NASBA), two-step multiplexed amplifications, rolling circle amplification (RCA), recombinase-polymerase amplification (RPA) (TwistDx, Cambridge, UK), and self-sustained sequence replication (3SR), including multiplex versions or combinations thereof, for example but not limited to, OLA/PCR, PCR/OLA, LDR/PCR, PCR/PCR/LDR, PCR/LDR, LCR/PCR, PCR/LCR (also known as combined chain reaction-CCR), and the like. Descriptions of such techniques can be found in, among other places, Sambrook et al. Molecular Cloning, 3rd Edition; Ausubel et al.; PCR Primer: A Laboratory Manual, Dieffenbach, Ed., Cold Spring Harbor Press (1995); The Electronic Protocol Book, Chang Bioscience (2002); Msuih et al., J. Clin. Micro. 34:501-07 (1996); The Nucleic Acid Protocols Handbook, R. Rapley, ed., Humana Press, Totowa, N.J. (2002).

Analysis of nucleic acid markers can be performed using techniques known in the art including, without limitation, sequence analysis and electrophoretic analysis. Non-limiting examples of sequence analysis include Maxam-Gilbert sequencing, Sanger sequencing, capillary array DNA sequencing, thermal cycle sequencing (Sears et al., Biotechniques, 13:626-633 (1992)), solid-phase sequencing (Zimmerman et al., Methods Mol. Cell Biol., 3:39-42 (1992)), sequencing with mass spectrometry such as matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF/MS; Fu et al., Nat. Biotechnol., 16:381-384 (1998)), and sequencing by hybridization (Chee et al., Science, 274:610-614 (1996); Drmanac et al., Science, 260:1649-1652 (1993); Drmanac et al., Nat. Biotechnol., 16:54-58 (1998)). Non-limiting examples of electrophoretic analysis include slab gel electrophoresis such as agarose or polyacrylamide gel electrophoresis, capillary electrophoresis, and denaturing gradient gel electrophoresis. Additionally, next generation sequencing methods can be performed using commercially available kits and instruments such as the Life Technologies/Ion Torrent PGM or Proton, the Illumina HiSEQ or MiSEQ, and the Roche/454 next generation sequencing system.

In some embodiments, the amount of probe that gives a fluorescent signal in response to an excitation light typically relates to the amount of nucleic acid produced in the amplification reaction. Thus, in some embodiments, the amount of fluorescent signal is related to the amount of product created in the amplification reaction. In such embodiments, one can therefore measure the amount of amplification product by measuring the intensity of the fluorescent signal from the fluorescent indicator.
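As a rough illustration of how signal can track product, the sketch below uses an idealized exponential model in which the product multiplies by (1 + efficiency) each cycle and the signal crosses a threshold once enough product exists. The model, the function names, and all numeric values are assumptions made for illustration; they are not taken from the disclosure, and real amplification curves plateau.

```python
# Idealized model for illustration only; not taken from the disclosure.
def amplicon_copies(initial_copies: float, cycles: int,
                    efficiency: float = 1.0) -> float:
    """Copies after a number of cycles, assuming each cycle multiplies
    the product by (1 + efficiency); efficiency 1.0 means perfect doubling."""
    return initial_copies * (1.0 + efficiency) ** cycles


def threshold_cycle(initial_copies: float, threshold_copies: float,
                    efficiency: float = 1.0, max_cycles: int = 40):
    """First cycle at which the modeled product reaches the detection
    threshold, or None if it is not reached within max_cycles."""
    for cycle in range(max_cycles + 1):
        if amplicon_copies(initial_copies, cycle, efficiency) >= threshold_copies:
            return cycle
    return None


# Hypothetical inputs: a tenfold difference in starting template.
low_input = threshold_cycle(10, 1e6)
high_input = threshold_cycle(100, 1e6)
```

Under this model a sample with more starting template crosses the threshold earlier, which is the basis for reading relative quantity from a real-time signal.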

“Detectably labeled probe” or “detector probe” refers to a molecule used in an amplification reaction, typically for quantitative or real-time PCR analysis, as well as end-point analysis. Such detector probes can be used to monitor the amplification of the target nucleic acid sequence. In some embodiments, detector probes present in an amplification reaction are suitable for monitoring the amount of amplicon(s) produced as a function of time. Such detector probes include, but are not limited to, the 5′-exonuclease assay (TAQMAN® probes described herein; see also U.S. Pat. No. 5,538,848), various stem-loop molecular beacons (see, for example, U.S. Pat. Nos. 6,103,476 and 5,925,517 and Tyagi and Kramer, 1996, Nature Biotechnology 14:303-308), stemless or linear beacons (see, e.g., WO 99/21881), PNA Molecular Beacons™ (see, e.g., U.S. Pat. Nos. 6,355,421 and 6,593,091), linear PNA beacons (see, for example, Kubista et al., 2001, SPIE 4264:53-58), non-FRET probes (see, for example, U.S. Pat. No. 6,150,097), Sunrise®/Amplifluor™ probes (U.S. Pat. No. 6,548,250), stem-loop and duplex Scorpion probes (Solinas et al., 2001, Nucleic Acids Research 29: E96 and U.S. Pat. No. 6,589,743), bulge loop probes (U.S. Pat. No. 6,590,091), pseudo knot probes (U.S. Pat. No. 6,589,250), cyclicons (U.S. Pat. No. 6,383,752), MGB Eclipse™ probe (Epoch Biosciences), hairpin probes (U.S. Pat. No. 6,596,490), peptide nucleic acid (PNA) light-up probes, self-assembled nanoparticle probes, and ferrocene-modified probes described, for example, in U.S. Pat. No. 6,485,901; Mhlanga et al., 2001, Methods 25:463-471; Whitcombe et al., 1999, Nature Biotechnology. 17:804-807; Isacsson et al., 2000, Molecular Cell Probes. 14:321-328; Svanvik et al., 2000, Anal Biochem. 281:26-35; Wolffs et al., 2001, Biotechniques 766:769-771; Tsourkas et al., 2002, Nucleic Acids Research. 30:4208-4215; Riccelli et al., 2002, Nucleic Acids Research 30:4088-4093; Zhang et al., 2002 Shanghai. 34:329-332; Maxwell et al., 2002, J. Am. Chem. Soc. 124:9606-9612; Broude et al., 2002, Trends Biotechnol. 20:249-56; Huang et al., 2002, Chem. Res. Toxicol. 15:118-126; and Yu et al., 2001, J. Am. Chem. Soc. 14:11155-11161.

Detector probes can also include quenchers, including without limitation black hole quenchers (Biosearch), Iowa Black (IDT), QSY quencher (Molecular Probes), and Dabsyl and Dabcel sulfonate/carboxylate Quenchers (Epoch).

Detector probes can also include two probes, wherein for example a fluor is on one probe, and a quencher is on the other probe, wherein hybridization of the two probes together on a target quenches the signal, or wherein hybridization on the target alters the signal signature via a change in fluorescence. Detector probes can also comprise sulfonate derivatives of fluorescein dyes with SO₃ instead of the carboxylate group, phosphoramidite forms of fluorescein, and phosphoramidite forms of CY 5 (commercially available, for example, from Amersham). In some embodiments, intercalating labels are used such as ethidium bromide, SYBR® Green I (Molecular Probes), and PicoGreen® (Molecular Probes), thereby allowing visualization in real-time, or end point, of an amplification product in the absence of a detector probe. In some embodiments, real-time visualization can comprise both an intercalating detector probe and a sequence-based detector probe. In some embodiments, the detector probe is at least partially quenched when not hybridized to a complementary sequence in the amplification reaction, and is at least partially unquenched when hybridized to a complementary sequence in the amplification reaction. In some embodiments, the detector probes of the present teachings have a Tm of 63-69° C., though it will be appreciated that guided by the present teachings routine experimentation can result in detector probes with other Tms. In some embodiments, probes can further comprise various modifications such as a minor groove binder (see for example U.S. Pat. No. 6,486,308) to further provide desirable thermodynamic characteristics.

In some embodiments, detection can occur through any of a variety of mobility dependent analytical techniques based on differential rates of migration between different analyte species. Exemplary mobility-dependent analysis techniques include electrophoresis, chromatography, mass spectroscopy, sedimentation, for example, gradient centrifugation, field-flow fractionation, multi-stage extraction techniques, and the like. In some embodiments, mobility probes can be hybridized to amplification products, and the identity of the target nucleic acid sequence determined via a mobility dependent analysis technique of the eluted mobility probes, as described for example in Published P.C.T. Application WO04/46344 to Rosenblum et al., and WO01/92579 to Wenz et al. In some embodiments, detection can be achieved by various microarrays and related software such as the Applied Biosystems Array System with the Applied Biosystems 1700 Chemiluminescent Microarray Analyzer and other commercially available array systems available from Affymetrix, Agilent, Illumina, and Amersham Biosciences, among others (see also Gerry et al., J. Mol. Biol. 292:251-62, 1999; De Bellis et al., Minerva Biotec 14:247-52, 2002; and Stears et al., Nat. Med. 9:14045, including supplements, 2003). It will also be appreciated that detection can comprise reporter groups that are incorporated into the reaction products, either as part of labeled primers or due to the incorporation of labeled dNTPs during an amplification, or attached to reaction products, for example but not limited to, via hybridization tag complements comprising reporter groups or via linker arms that are integral or attached to reaction products. Detection of unlabeled reaction products, for example using mass spectrometry, is also within the scope of the current teachings.

The kits of the present invention may also comprise instructions for performing one or more methods described herein and/or a description of one or more compositions or reagents described herein. Instructions and/or descriptions may be in printed form and may be included in a kit insert. A kit also may include a written description of an Internet location that provides such instructions or descriptions.

A subject sample can be any bodily tissue or fluid that includes nucleic acids from the subject. In certain embodiments, the sample will be a blood sample comprising circulating tumor cells or cell free DNA. In other embodiments, the sample can be a tissue, such as a cancerous tissue. The cancerous tissue can be from a tumor tissue and may be fresh frozen or formalin-fixed, paraffin-embedded (FFPE).

As used herein, BLCA=bladder carcinoma, BRCA=breast carcinoma, CESC=cervical cell carcinoma, COAD=colon adenocarcinoma, GBM=glioblastoma multiforme, HNSC=head and neck squamous cell carcinoma, KIRC=clear cell renal cell carcinoma, KIRP=kidney renal papillary cell carcinoma, LAML=acute myeloid leukemia, LGG=brain lower grade glioma, LIHC=liver hepatocellular carcinoma, LUAD=lung adenocarcinoma, LUSC=squamous cell lung carcinoma, OV=ovarian serous adenocarcinoma, PRAD=prostate adenocarcinoma, READ=rectal adenocarcinoma, SKCM=cutaneous melanoma, STAD=stomach adenocarcinoma, THCA=thyroid carcinoma, and UCEC=uterine corpus endometrioid carcinoma.

The disclosure provides novel associations and variants (i.e., varying breakpoint locations on one or both of the partner genes) of gene fusions such as TMPRSS2/ERG, FGFR3/TACC3, RET/CCDC6 and RET/ERC1. The disclosure contemplates isolated nucleic acid sequences of the gene fusions and sequences complementary thereto, amplicons, transcripts, as well as probes that specifically recognize the nucleic acid sequences of the gene fusions, sequences complementary thereto, amplicons, and transcripts.

In certain embodiments, the disclosure provides a set of probes that specifically recognize one or more of the gene fusions disclosed herein.

In some embodiments, the kits and assays comprise probes that specifically recognize a target, such as a gene fusion nucleic acid sequence.

In another embodiment, the disclosure provides diagnostics and treatment targets utilizing the disclosed gene fusions. The gene fusions and associated disease states provide targets for both diagnosis and treatment. For instance, the presence, absence, or increased or decreased expression of a gene fusion target can be used to diagnose a disease state. Likewise, the gene fusions can be used for targeted therapies.

Methods of diagnosing, treating, and detecting gene fusions and associated disease are further contemplated herein.

Transmembrane protease, serine 2 is an enzyme encoded by the TMPRSS2 gene. The TMPRSS2 protein's function in prostate cancer relies on over-expression of ETS transcription factors, such as ERG, through gene fusion. Although the TMPRSS2/ERG gene fusion is known to be associated with prostate cancer, the present disclosure provides numerous novel breakpoints in each gene resulting in novel forms of the fusion. See FIG. 5. Knowing the exon location of the breakpoint in each gene is useful for detecting a gene fusion. However, the breakpoint locations in each gene (see Gene A breakpoint and Gene B breakpoint in FIG. 5A) can be used to precisely target one or more breakpoints, for primer, probe and assay design. The breakpoint variants for TMPRSS2/ERG are shown in Table 3.

Exon expression of a gene fusion is altered after the breakpoint. There is an imbalance in exon expression before and after the breakpoint. This is shown, for example, for the TMPRSS2/ERG fusion in FIG. 6. Exon imbalance is associated with other gene fusions disclosed herein (see also FIGS. 8 and 9).
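One simple way to put a number on such an imbalance is the ratio of mean expression for exons after the breakpoint to mean expression for exons before it. The sketch below is only an illustration under that assumption: the metric, the function name, and the per-exon values are hypothetical and are not the analysis used to produce the figures.

```python
from statistics import mean


def exon_imbalance(exon_expression, first_exon_after_breakpoint):
    """Ratio of mean expression after the breakpoint to mean expression
    before it.

    exon_expression: per-exon expression values ordered exon 1..N.
    first_exon_after_breakpoint: 1-based number of the first exon that
    lies after the breakpoint.
    """
    split = first_exon_after_breakpoint - 1
    before = exon_expression[:split]
    after = exon_expression[split:]
    if not before or not after:
        raise ValueError("breakpoint must leave exons on both sides")
    mean_before = mean(before)
    if mean_before == 0:
        return float("inf")
    return mean(after) / mean_before


# Hypothetical values: exons 4-6 are expressed about four times higher
# than exons 1-3, as might be seen for a 3' fusion partner.
ratio = exon_imbalance([1.0, 1.0, 1.0, 4.0, 4.0, 4.0], 4)
```

A ratio near 1 would suggest no imbalance at the tested position, while a ratio far from 1 would flag a candidate breakpoint for follow-up.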

FGFR3/TACC3 is a fusion of fibroblast growth factor receptor (FGFR3) and transforming acidic coiled-coil (TACC3) coding domains. The FGFR3/TACC3 fusion protein displays oncogenic activity. Although FGFR3/TACC3 has been previously reported, provided herein are numerous variations of the FGFR3/TACC3 gene fusion in which the break points differ. Table 1 shows the locations for each break point in both FGFR3 and TACC3 and Table 2 provides the break point sequences (see SEQ ID NOs: 1-12).

The FGFR3/TACC3 fusion is known to be associated with human glioblastoma. However, as shown herein, FGFR3/TACC3 is also associated with bladder, head and neck and lung squamous cell cancers. FIGS. 4 and 7 demonstrate the presence as well as an upregulation of the FGFR3/TACC3 fusion in each of these disease states. In addition, exon expression imbalance was observed at TACC3 exon 10. It is noteworthy that the tyrosine kinase domain of FGFR3 and the TACC domain of TACC3 are preserved in the fusion. These domains are necessary for gene fusion function.

RET is a proto-oncogene located on chromosome 10 and has 21 exons. RET fusions are known in various disease states. RET fusions involve multiple partners and vary according to disease state. As shown in FIG. 9, CCDC6 and ERC1 are fusion partners of RET. In samples taken from breast, lung and thyroid cancer, RET was present in each disease state. However, only CCDC6/RET fusions were found in lung and thyroid cancer samples, whereas ERC1/RET fusions were found in breast cancer samples. In all fusions, the RET tyrosine kinase domain was preserved. Table 4 shows the breakpoints for each gene in the CCDC6/RET and ERC1/RET fusion variants.

In certain embodiments, assays and methods of detection are provided. Methods for detecting gene fusions provided herein are known in the art. As non-limiting examples, such assays can include 5′ nuclease PCR assays (Applied Biosystems, Foster City, Calif.) or microarray assays (Skotheim et al., Molecular Cancer 2009, 8:5).

TaqMan Gene Expression Assays can be designed for a set of known fusion transcripts for quantitative analysis. Such assays are designed such that the primers and probe span the breakpoint region but are not placed directly on the breakpoint.

Computer Implemented Systems

Computer systems can be utilized in certain embodiments of the disclosure. In various embodiments, a computer system can include a bus or other communication mechanism for communicating information, and a processor coupled with the bus for processing information. In various embodiments, the computer system can also include a memory, which can be a random access memory (RAM) or other dynamic storage device, coupled to the bus for determining base calls, and instructions to be executed by the processor. The memory also can be used for storing temporary variables or other intermediate information during execution of instructions to be executed by the processor. In various embodiments, the computer system can further include a read only memory (ROM) or other static storage device coupled to the bus for storing static information and instructions for the processor. A storage device, such as a magnetic disk or optical disk, can be provided and coupled to the bus for storing information and instructions.

In various embodiments, the computer system can be coupled via the bus to a display, such as a cathode ray tube (CRT) or liquid crystal display (LCD), for displaying information to a computer user. An input device, including alphanumeric and other keys, can be coupled to the bus for communicating information and command selections to the processor. Another type of user input device is a cursor control, such as a mouse, a trackball or cursor direction keys for communicating direction information and command selections to the processor and for controlling cursor movement on the display. This input device typically has two degrees of freedom in two axes, a first axis (i.e., x) and a second axis (i.e., y), that allows the device to specify positions in a plane.

A computer system can perform the present teachings. Consistent with certain implementations of the present teachings, results can be provided by the computer system in response to the processor executing one or more sequences of one or more instructions contained in the memory. Such instructions can be read into the memory from another computer-readable medium, such as the storage device. Execution of the sequences of instructions contained in the memory can cause the processor to perform the processes described herein. Alternatively, hard-wired circuitry can be used in place of or in combination with software instructions to implement the present teachings. Thus implementations of the present teachings are not limited to any specific combination of hardware circuitry and software.

In various embodiments, the term “computer-readable medium” as used herein refers to any media that participates in providing instructions to processor for execution. Such a medium can take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Examples of non-volatile media can include, but are not limited to, optical or magnetic disks, such as storage device. Examples of volatile media can include, but are not limited to, dynamic memory, such as memory. Examples of transmission media can include, but are not limited to, coaxial cables, copper wire, and fiber optics, including the wires that comprise bus.

Common forms of non-transitory computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, PROM, and EPROM, a FLASH-EPROM, any other memory chip or cartridge, or any other tangible medium from which a computer can read.

In accordance with various embodiments, instructions configured to be executed by a processor to perform a method are stored on a computer-readable medium. The computer-readable medium can be a device that stores digital information. For example, a computer-readable medium includes a compact disc read-only memory (CD-ROM) as is known in the art for storing software. The computer-readable medium is accessed by a processor suitable for executing instructions configured to be executed.

In accordance with the teachings and principles embodied in this application, methods, systems, and computer readable media that can efficiently collect, analyze, store, transfer, retrieve, and/or distribute information across multiple sites and/or entities, including genomic and/or patient information, are provided.

In one embodiment, a system is provided for determining whether one or more gene fusion and/or variant is present in a sample. The system can further identify a disease state, such as cancer, associated with the one or more gene fusion and/or gene variant, as well as an appropriate treatment in accordance with the mutation status. In certain embodiments, the system comprises a processor in communication with a sequencing instrument that receives sequencing data.

In some embodiments, the processor can execute one or more variant calls. In some embodiments, the processor can provide, filter, and/or annotate predictions.
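The annotation step can be pictured as a lookup of an observed breakpoint pair against known fusion variants. The sketch below is a simplified illustration and not the disclosed system: the dictionary layout and function name are hypothetical, the call is assumed to have been made already by upstream software, and only three FGFR3/TACC3 breakpoint pairs from Table 1 are included.

```python
# Hypothetical lookup table built from three FGFR3/TACC3 breakpoint pairs
# listed in Table 1 (chr4 coordinates) and the TCGA diseases reported there.
KNOWN_FUSION_VARIANTS = {
    ("FGFR3", 1808661, "TACC3", 1741429): {"GBM", "BLCA", "HNSC", "LUSC"},
    ("FGFR3", 1808661, "TACC3", 1739325): {"GBM", "HNSC"},
    ("FGFR3", 1808574, "TACC3", 1741459): {"LUSC"},
}


def annotate_fusion_call(gene_a, breakpoint_a, gene_b, breakpoint_b):
    """Return the sorted TCGA disease codes associated with an observed
    breakpoint pair, or an empty list if the pair is not in the table."""
    key = (gene_a, breakpoint_a, gene_b, breakpoint_b)
    return sorted(KNOWN_FUSION_VARIANTS.get(key, set()))


# Example: annotate one observed call.
diseases = annotate_fusion_call("FGFR3", 1808661, "TACC3", 1739325)
```

An exact-match lookup is the simplest possible design; a practical system would likely tolerate small coordinate offsets and filter on the number of spanning reads.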

EXAMPLES

A 5′ nuclease assay (i.e., a TaqMan assay) is designed to detect a fusion of genes FGFR3 and TACC3 using commercial assay design services (Applied Biosystems, Foster City, Calif.). (See “Gene Expression Assay Performance Guaranteed With the TaqMan® Assays QPCR Guarantee Program,” White Paper available from Applied Biosystems/Life Technologies, Foster City, Calif.). For the assay design, a first primer binds to a first primer binding region of a target nucleic acid, and a detector probe binds to a detector probe binding region of the target nucleic acid on one side of a fusion transcript breakpoint that is found on the target nucleic acid. A second primer binds to a second primer binding region on the other side of the fusion transcript breakpoint. The transcript breakpoint region (˜10 bp), SNPs and repetitive sequences are masked before the assay is designed using the Applied Biosystems custom assay design service (Applied Biosystems, Foster City, Calif.). The first and second primers and detection probe are used in a 5′ nuclease assay using standard master mix formulations and cycling conditions on commercially available real-time thermocyclers (Applied Biosystems, Foster City, Calif.).
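The breakpoint-masking step can be sketched as follows. This is an illustration only and does not reproduce the commercial design service: the function name is hypothetical, the mask is assumed to be split evenly across the junction, and SNP and repeat masking are not shown. The input uses the "left|right" junction notation of Table 2, with SEQ ID NO: 3 as the example.

```python
def mask_breakpoint(junction: str, mask_width: int = 10) -> str:
    """Replace roughly mask_width bases centered on the '|' junction
    marker with 'N' and return the sequence without the marker."""
    left, right = junction.split("|")
    left_n = min(mask_width // 2, len(left))
    right_n = min(mask_width - mask_width // 2, len(right))
    return (left[:len(left) - left_n]
            + "N" * (left_n + right_n)
            + right[right_n:])


# SEQ ID NO: 3 from Table 2, written in the table's "left|right" notation.
SEQ_ID_NO_3 = ("GCAGCTGGTGGAGGACCTGGACCGTGTCCTTACCGTGACGTCCACCGACG"
               "|"
               "TAAAGGCGACACAGGAGGAGAACCGGGAGCTGAGGAGCAGGTGTGAGGAG")

masked = mask_breakpoint(SEQ_ID_NO_3)
```

With the junction masked, a design tool would be free to place the primers and probe on either side of the breakpoint region but not directly on it.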

Similar methods are used to detect TMPRSS2/ERG, CCDC6/RET, and ERC1/RET fusions comprising varying breakpoints, including the breakpoints disclosed herein. Tables 1-4 show the breakpoints for each of these fusions.

Unless otherwise indicated, all numbers expressing quantities of ingredients, properties such as molecular weight, reaction conditions, and so forth used in the specification and claims are to be understood as being modified in all instances by the term “about.” Accordingly, unless indicated to the contrary, the numerical parameters set forth in the specification and attached claims are approximations that may vary depending upon the desired properties sought to be obtained by the present invention. At the very least, and not as an attempt to limit the application of the doctrine of equivalents to the scope of the claims, each numerical parameter should at least be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the invention are approximations, the numerical values set forth in the specific examples are reported as precisely as possible. Any numerical value, however, inherently contains certain errors necessarily resulting from the standard deviation found in their respective testing measurements.

The terms “a,” “an,” “the” and similar referents used in the context of describing the invention (especially in the context of the following claims) are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range. Unless otherwise indicated herein, each individual value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention otherwise claimed. No language in the specification should be construed as indicating any non-claimed element essential to the practice of the invention.

Groupings of alternative elements or embodiments of the invention disclosed herein are not to be construed as limitations. Each group member may be referred to and claimed individually or in any combination with other members of the group or other elements found herein. It is anticipated that one or more members of a group may be included in, or deleted from, a group for reasons of convenience and/or patentability. When any such inclusion or deletion occurs, the specification is deemed to contain the group as modified thus fulfilling the written description of all Markush groups used in the appended claims.

Certain embodiments of this invention are described herein, including the best mode known to the inventors for carrying out the invention. Of course, variations on these described embodiments will become apparent to those of ordinary skill in the art upon reading the foregoing description. The inventors expect skilled artisans to employ such variations as appropriate, and the inventors intend for the invention to be practiced otherwise than specifically described herein. Accordingly, this invention includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the invention unless otherwise indicated herein or otherwise clearly contradicted by context.

Specific embodiments disclosed herein may be further limited in the claims using “consisting of” or “consisting essentially of” language. When used in the claims, whether as filed or added per amendment, the transition term “consisting of” excludes any element, step, or ingredient not specified in the claims. The transition term “consisting essentially of” limits the scope of a claim to the specified materials or steps and those that do not materially affect the basic and novel characteristic(s). Embodiments of the invention so claimed are inherently or expressly described and enabled herein.

Furthermore, numerous references have been made to patents and printed publications throughout this specification. Each of the above-cited references and printed publications is individually incorporated herein by reference in its entirety.

In closing, it is to be understood that the embodiments of the invention disclosed herein are illustrative of the principles of the present invention. Other modifications that may be employed are within the scope of the invention. Thus, by way of example, but not of limitation, alternative configurations of the present invention may be utilized in accordance with the teachings herein. Accordingly, the present invention is not limited to that precisely as shown and described.

APPENDIX 1

TABLE 1

TCGA     GeneA   GeneA       GeneA       GeneB   GeneB       Spanning
Disease  Symbol  Chromosome  Breakpoint  Symbol  Breakpoint  Reads     Fusion Type       Distance
LUSC     FGFR3   chr4        1808574     TACC3   1741459     18        Intrachromosomal  67115
GBM      FGFR3   chr4        1808661     TACC3   1739325     502       Intrachromosomal  69336
GBM      FGFR3   chr4        1808661     TACC3   1741429     4408      Intrachromosomal  67232
BLCA     FGFR3   chr4        1808661     TACC3   1741429     935       Intrachromosomal  67232
HNSC     FGFR3   chr4        1808661     TACC3   1741429     81        Intrachromosomal  67232
BLCA     FGFR3   chr4        1808661     TACC3   1741429     1796      Intrachromosomal  67232
LUSC     FGFR3   chr4        1808853     TACC3   1737034     227       Intrachromosomal  71819
HNSC     FGFR3   chr4        1808661     TACC3   1739325     1242      Intrachromosomal  69336
BLCA     FGFR3   chr4        1808924     TACC3   1739413     11        Intrachromosomal  69511
LUSC     FGFR3   chr4        1808661     TACC3   1741429     3785      Intrachromosomal  67232
LUSC     FGFR3   chr4        1808652     TACC3   1737485     661       Intrachromosomal  71167
LUSC     FGFR3   chr4        1808651     TACC3   1737484     369       Intrachromosomal  71167
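The Distance column of Table 1 appears to equal the absolute difference between the FGFR3 and TACC3 breakpoint coordinates on chr4. The short check below, a sketch with hypothetical names, recomputes that difference for each row transcribed from the table; the interpretation of the column is an inference from the numbers rather than a definition stated in the disclosure.

```python
# Rows transcribed from Table 1: (TCGA disease, FGFR3 breakpoint,
# TACC3 breakpoint, reported distance). All coordinates are on chr4.
TABLE_1_ROWS = [
    ("LUSC", 1808574, 1741459, 67115),
    ("GBM", 1808661, 1739325, 69336),
    ("GBM", 1808661, 1741429, 67232),
    ("BLCA", 1808661, 1741429, 67232),
    ("HNSC", 1808661, 1741429, 67232),
    ("BLCA", 1808661, 1741429, 67232),
    ("LUSC", 1808853, 1737034, 71819),
    ("HNSC", 1808661, 1739325, 69336),
    ("BLCA", 1808924, 1739413, 69511),
    ("LUSC", 1808661, 1741429, 67232),
    ("LUSC", 1808652, 1737485, 71167),
    ("LUSC", 1808651, 1737484, 71167),
]


def breakpoint_distance(breakpoint_a: int, breakpoint_b: int) -> int:
    """Absolute distance between two breakpoints on the same chromosome."""
    return abs(breakpoint_a - breakpoint_b)


def inconsistent_rows(rows):
    """Return rows whose reported distance differs from the recomputed one."""
    return [row for row in rows
            if breakpoint_distance(row[1], row[2]) != row[3]]


mismatches = inconsistent_rows(TABLE_1_ROWS)
```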

TABLE 2

TCGA                                                                                                            SEQ ID
Disease  FGFR3/TACC3 Breakpoint Sequence                                                                        NO:
LUSC     TGGACAAGCCCGCCAACTGCACACACGACCTGTACATGATCATGCGGGAG|CTGAGGAGCAGGTGTGAGGAGCTCCACGGGAAGAACCTGGAACTGGGGAA  1
GBM      GCAGCTGGTGGAGGACCTGGACCGTGTCCTTACCGTGACGTCCACCGACG|TGCCAGGCCCACCCCCAGGTGTTCCCGCGCCTGGGGGCCCACCCCTGTCC  2
GBM      GCAGCTGGTGGAGGACCTGGACCGTGTCCTTACCGTGACGTCCACCGACG|TAAAGGCGACACAGGAGGAGAACCGGGAGCTGAGGAGCAGGTGTGAGGAG  3
BLCA     AGCAGCTGGTGGAGGACCTGGACCGTGTCCTTACCGTGACGTCCACCGAC|GTAAAGGCGACACAGGAGGAGAACCGGGAGCTGAGGAGCAGGTGTGAGGA  4
HNSC     AGCAGCTGGTGGAGGACCTGGACCGTGTCCTTACCGTGACGTCCACCGAC|GTAAAGGCGACACAGGAGGAGAACCGGGAGCTGAGGAGCAGGTGTGAGGA  5
BLCA     AGCAGCTGGTGGAGGACCTGGACCGTGTCCTTACCGTGACGTCCACCGAC|GTAAAGGCGACACAGGAGGAGAACCGGGAGCTGAGGAGCAGGTGTGAGGA  6
LUSC     AGGACTGCTTCCTCAAGGCCGACTCCTTAAACGAGGAAGTTCCAAACTGC|TCCAGGTACTCCTGCTGGCGGGAGGCGGGGTGAGCGCTGTGCCACCGAGC  7
HNSC     AGCAGCTGGTGGAGGACCTGGACCGTGTCCTTACCGTGACGTCCACCGAC|GTGCCAGGCCCACCCCCAGGTGTTCCCGCGCCTGGGGGCCCACCCCTGTC  8
BLCA     GTACTCCCCGGGTGGCCAGGACACCCCCAGCTCCAGCTCCTCAGGGGACG|AGGACCTGGATGCAGTGGTAAAGGCGACACAGGAGGAGAACCGGGAGCTG  9
LUSC     GCAGCTGGTGGAGGACCTGGACCGTGTCCTTACCGTGACGTCCACCGACG|TAAAGGCGACACAGGAGGAGAACCGGGAGCTGAGGAGCAGGTGTGAGGAG  10
LUSC     CCACCTTCAAGCAGCTGGTGGAGGACCTGGACCGTGTCCTTACCGTGACG|TCCTTATACCTCAAGTTCGACCCCCTCCTGAGGGACAGTCCTGGTAGACC  11
LUSC     GTCTACCAGGACTGTCCCTCAGGAGGGGGTCGAACTTGAGGTATAAGGAC|GTCACGGTAAGGACACGGTCCAGGTCCTCCACCAGCTGCTTGAAGGTGGG  12

TABLE 3

GeneA    GeneA       GeneA       GeneB   GeneB       GeneB       Spanning
Symbol   Chromosome  Breakpoint  Symbol  Chromosome  Breakpoint  Reads
TMPRSS2  chr21       42,870,046  ERG     chr21       39,817,544  61
TMPRSS2  chr21       42,879,877  ERG     chr21       39,956,869  17
TMPRSS2  chr21       42,880,008  ERG     chr21       39,817,544  5
TMPRSS2  chr21       42,870,046  ERG     chr21       39,817,544  75
TMPRSS2  chr21       42,879,877  ERG     chr21       39,817,544  30
TMPRSS2  chr21       42,879,877  ERG     chr21       39,817,544  10
TMPRSS2  chr21       42,870,046  ERG     chr21       39,817,544  198
TMPRSS2  chr21       42,870,046  ERG     chr21       39,817,544  52
TMPRSS2  chr21       42,880,008  ERG     chr21       39,817,544  8
TMPRSS2  chr21       42,870,046  ERG     chr21       39,817,544  188
TMPRSS2  chr21       42,879,891  ERG     chr21       39,817,518  23
TMPRSS2  chr21       42,870,046  ERG     chr21       39,817,544  59
TMPRSS2  chr21       42,866,283  ERG     chr21       39,817,544  54
TMPRSS2  chr21       42,879,884  ERG     chr21       39,870,288  23
TMPRSS2  chr21       42,880,008  ERG     chr21       39,817,544  5
TMPRSS2  chr21       42,879,877  ERG     chr21       39,817,544  9
TMPRSS2  chr21       42,879,877  ERG     chr21       39,956,869  25
TMPRSS2  chr21       42,866,406  ERG     chr21       39,817,479  6
TMPRSS2  chr21       42,879,891  ERG     chr21       39,817,518  30
TMPRSS2  chr21       42,870,046  ERG     chr21       39,817,544  42
TMPRSS2  chr21       42,870,046  ERG     chr21       39,817,544  15
TMPRSS2  chr21       42,870,046  ERG     chr21       39,817,544  214
TMPRSS2  chr21       42,879,877  ERG     chr21       39,817,544  33

TABLE 4

TCGA     GeneA   GeneA       GeneA       GeneB   GeneB       GeneB       Spanning
Disease  Symbol  Chromosome  Breakpoint  Symbol  Chromosome  Breakpoint  Reads
THCA     CCDC6   chr10       61,665,917  RET     chr10       43,610,148  59
THCA     CCDC6   chr10       61,665,897  RET     chr10       43,610,044  16
THCA     CCDC6   chr10       61,665,897  RET     chr10       43,610,044  102
THCA     CCDC6   chr10       61,612,343  RET     chr10       43,612,054  71
LUAD     CCDC6   chr10       61,665,880  RET     chr10       43,612,032  17
THCA     CCDC6   chr10       61,554,235  RET     chr10       43,610,029  65
LUAD     CCDC6   chr10       61,665,880  RET     chr10       43,612,032  8
THCA     CCDC6   chr10       61,665,880  RET     chr10       43,610,035  52
THCA     CCDC6   chr10       61,665,897  RET     chr10       43,610,044  85
THCA     CCDC6   chr10       61,665,897  RET     chr10       43,610,044  79
THCA     CCDC6   chr10       61,665,880  RET     chr10       43,612,032  71
THCA     CCDC6   chr10       61,612,343  RET     chr10       43,612,054  65
BRCA     ERC1    chr12       1,250,953   RET     chr10       43,612,032  17
THCA     CCDC6   chr10       61,612,343  RET     chr10       43,612,054  11
THCA     CCDC6   chr10       61,665,897  RET     chr10       43,610,044  117

1-8. (canceled)
 9. A method of detecting lung squamous cell carcinoma or thyroid carcinoma in a human tissue or blood sample, the method comprising: amplifying a CCDC6/RET gene fusion or fusion gene product using a set of probes that specifically recognize at least one nucleic acid in the CCDC6/RET gene or fusion gene product of Table 4; and detecting the presence of the CCDC6/RET gene fusion or fusion gene product in the sample; wherein detecting the presence of the CCDC6/RET fusion or fusion gene product, indicates lung squamous cell carcinoma or thyroid carcinoma is present in the sample.
 10. A method of detecting breast carcinoma in a human tissue or blood sample, the method comprising: amplifying a ERC1/RET gene fusion using a set of probes that specifically recognize at least one nucleic acid in the ERC1/RET fusion gene or a fusion gene product of Table 4; and detecting the presence of the ERC1/RET gene fusion or fusion gene product in the sample; wherein detecting the presence of the ERC1/RET fusion gene or fusion gene product, indicates breast carcinoma is present in the sample.
 11-12. (canceled)
 13. The method of claim 9, wherein the sample is a patient tumor sample.
 14. The method of claim 13, further comprising diagnosing the patient as having bladder carcinoma, head and neck squamous cell carcinoma, or lung squamous cell carcinoma when a nucleic acid comprising a sequence selected from SEQ ID NOs: 1-12 is present in the patient sample.
 15. The method of claim 13, further comprising diagnosing the patient as having breast carcinoma when the ERC1/RET fusion gene or fusion gene product is present in the sample.
 16. The method of claim 13, further comprising diagnosing the patient as having thyroid carcinoma when the CCDC6/RET fusion gene or fusion gene product is present in the sample.
 17. The method of claim 10, wherein the sample is a patient tumor sample.
 18. The method of claim 10, wherein the detecting is by sequencing of the amplified nucleic acid.
 19. A method of detecting a TMPRSS2/ERG gene fusion comprising a breakpoint of Table 3 in a human tissue or blood sample from a patient having prostate cancer, the method comprising: generating a reaction mixture comprising a plurality of primers that specifically hybridize to a target nucleic acid comprising one of the TMPRSS2/ERG breakpoints of Table 3 and nucleic acid from the human tissue or blood sample wherein the sample comprises one or more target nucleic acid(s); amplifying the target nucleic acid(s) using the plurality of primers, thereby producing amplicons; sequencing the amplicons; and detecting the presence of a TMPRSS2/ERG gene fusion comprising a breakpoint of Table 3.
 20. The method of claim 19, wherein the sample is a patient tumor sample.
 21. The method of claim 19, wherein the sequencing is by next generation sequencing technology.